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27 Abstract 

28 The temperate-tropical division of early maize germplasm to different agricultural 

29 environments was arguably the greatest adaptation process associated with the success 

30 and near ubiquitous importance of global maize production. Deciphering this history is 

31 challenging, but new insight has been gained from the genomic, transcriptomic and 

32 phenotypic variation collected from 368 diverse temperate and tropical maize inbred 

33 lines in this study. This is the first attempt to systematically explore the mechanisms of 

34 the adaptation process. Our results indicated that divergence between tropical and 

35 temperate lines seem occur 3,400-6,700 years ago. A number of genomic selection 

36 signals and transcriptomic variants including differentially expressed individual genes 

37 and rewired co-expression networks of genes were identified. These candidate signals 

38 were found to be functionally related to stress response and most were associated with 

39 directionally selected traits, which may have been an advantage under widely varying 

40 environmental conditions faced by maize as it was migrated away from its 

41 domestication center. It's also clear in our study that such stress adaptation could 

42 involve evolution of protein-coding sequences as well as transcriptome-level 

43 regulatory changes. This latter process may be a more flexible and dynamic way for 

44 maize to adapt to environmental changes over this dramatically short evolutionary time 

45 frame. 
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46 Introduction 

47 Maize {Zea mays ssp. mays) is essential to the global food supply, with current 

48 total maize grain production higher than any other crop (USDA FAS 2013). Maize is 

49 also used as a model to investigate crop evolution and improvement (Doebley et al. 

50 2006). It is thought to have been domesticated from teosinte {Zea mays ssp. 

51 parviglumis) about 9,000-10,000 years ago in southwestern Mexico, which is a mid- to 

52 lowland tropical growing environment (Matsuoka et al. 2002; Van Heerwaarden et al. 

53 201 1). The remarkable conversion of a Mexican annual grass species into the top food, 

54 feed and industrial crop in the world resulted from the spread of temperate maize over 

55 several thousand years from its tropical geographic origin to the north and east across 

56 North America and to the south across most of Latin America, eventually creating a 

57 maize distribution from ~40°S in Chile to -45 °N in Canada (Matsuoka et al. 2002). 

58 Centuries ago, maize cultivation expanded further to East Asia, Europe, and Africa, 

59 and the temperate -tropical division remains in all crop-growing continents today. When 

60 faced with widely varying temperate conditions in temperature, day length, and disease 

61 susceptibility, maize adapted remarkably well. One major goal of adaptation studies is 

62 to identify specific genomic changes contributing to advantageous phenotypic 

63 performance in varying environmental conditions. 
64 

65 In order to identify the genetic factors driving maize evolution, researchers have 

66 explored a number of methods to reveal footprints of selection within the genome 

67 (Chia et al. 2012; Hufford et al. 2012; Jiao et al. 2012). It is intriguing that these 

68 changes occurred within such a short evolutionary time frame. The importance of 

69 transcriptional regulation of rapid phenotypic evolution has been a central tenet of 

70 recent studies (Swanson-Wagner et al. 2012; Carroll 2008; Ecker et al. 2012; Koenig et 

71 al. 2013). Genes with differential expression and altered expression networks could 

72 provide evidence of the contribution of trans crip tome regulational changes to the 

73 adaptation process. RNA-seq (Wang et al. 2009) allows cost-effective exploration of 

74 both sequence and transcriptional variation, particularly in large and repetitive 

75 sequence-rich genomes such as maize. 
76 

77 Seed development, a critical process to both plant propagation and food supply, is a 

78 time in which DNA methylation and chromatin remodeling, and thus transcriptional 

79 patterns, are reshaped for the new generation (Ahmad et al. 2010; Wollmann et al. 

80 2012; Zanten et al. 201 1). Transcriptional variation may thus heavily influence 
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81 seed-related traits via environmentally-sensitive epigenetic control (Zhang and Ogas 

82 2009), which will be expressed as selectable variation throughout the lifetime of the 

83 plant (Kapazoglou et al. 2013; Casas et al. 2012). Most maize genes are expressed in 

84 seed or embryos, many of which are not expressed again (Cho et al. 2012; Sekhon et al. 

85 201 1). Thus, the seed offers the best window into visualizing differences that may 

86 account for adaptation. 
87 

88 To study the nature of maize adaptation from tropical to temperate growing regions, a 

89 panel of 368 diverse maize inbred lines (Li et al. 2013) (Supplemental Table SI) was 

90 characterized. We combined RNA-seq of seeds (15 days after pollination; Fu et al. 

91 2013) with data from the MaizeSNP50 BeadChip, resulting in over one million 

92 high-quality SNPs and expression data from 28,769 genes, analyzed together with 662 

93 phenotypic traits. These included morphological, agronomic, physiological and 

94 metabolic traits, many of which are also known to be important in stress adaptation 

95 (Bohnert and Sheveleva 1998; Bhargava and Sawant 2013). This study is the first 

96 systematic exploration of the mechanisms of the maize adaptation process with the 

97 goal of answering several specific questions: What phenotypic changes in temperate 

98 lines convey an advantage in novel environments? Which genomic regions were 

99 selected during the adaptation process? What phenotypes do these regions likely affect? 

100 To what extent do regulatory changes contribute to evolution? What beneficial value 

101 do they provide in the relatively short evolutionary time frame suggested in this study? 

102 Although only one organ (the seed) was sequenced, such knowledge will position us 

103 with general understanding of the maize adaptation process, and provide resources for 

104 developing breeding strategies to help corn producers cope with an increasingly erratic 

105 climate. 
106 

107 Results 

108 Population level differences between temperate and tropical lines 

109 The population-scaled recombination rates (p) in temperate and tropical lines were 

110 1 .078/kb and 2.644/kb, respectively. This is a reflection of different rates of LD decay, 

111 which was much faster in the tropical lines at the whole genome level (Supplemental 

112 Fig. SI). Recombination rate differences in temperate vs. tropical lines was smaller 

113 than the decrease in p seen in a previous study by Hufford et al. (2012) in landraces 

114 compared to teosinte (59% vs. 75%). A cross population composite likelihood approach 

115 (Chen et al. 2010) (XP-CLR) was used to identify extreme allele frequency 
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116 differentiation over linked regions when comparing temperate to tropical 

117 subpopulations. We identified 701 regions containing 1660 selected genes at the 

118 highest 10% of XP-CLR values (Fig. 1, Supplemental Table S2,3), ranging in size from 

119 lOkb to 2,320kb, with an average of 150.9kb; this is shorter than the 322kb average 

120 region associated with domestication (Hufford et al. 2012). The combined length of 

121 selected regions was 105.7Mb, covering 5.2% of the genome. The selection coefficient 

122 in the adaptation process was 0.090, which is higher than domestication (0.015) and 

123 improvement (0.003) (Hufford et al. 2012), indicating a stronger selection pressure 

124 during the adaptation process. However, the coefficient and size of the genomic region 

125 associated with selection may not be directly comparable with adaptation, since most 

126 polymorphisms in the current study are based on expressed genes, compared to random 

127 sequence polymorphisms measured by Hufford et al. (2012) 
128 

129 Nucleotide diversity (n) in the selected features identified by XP-CLR in temperate vs. 

130 tropical lines was 8.22E-04 and 8.42E-04, respectively, indicating a reduction of 2.4% 

131 in temperate lines. This decrease is less than the reduction of nucleotide diversity in 

132 selected features identified during domestication (17%) by Hufford et al. (2012). F st of 

133 selected features was 0.027, compared to 0.1 1 between teosinte and landraces, possibly 

134 due to the shorter time for adaptation from tropical to temperate than for domestication. 

135 However, this result is similar to 0.02 reported between landraces and improved lines 

136 (Hufford et al. 2012). 
137 

138 The divergence time between temperate and tropical subpopulations is of interest and 

139 can be associated with the development of agriculture and the spread of human 

140 civilization in the Americas; however, archeological information on this topic is 

141 incomplete and occasionally contradictory (Piperno and Pearsall 1998; Staller and 

142 Thompson 2002; Blake et al. 2006; Grobman et al. 2012). We proposed three models 

143 (detailed in the methods section, Fig. 2A-C) to estimate the time of divergence, 

144 resulting in the estimation of 3,400-6,700 years BP. This time frame is supported by 

145 recent archeological evidence (Haas et al. 2013) and implies that after domestication, 

146 maize cultivation rapidly expanded to temperate America (Fig. 2D). The molecular 

147 evidence thus suggests that improvement and adaptation here may not have been 

148 sequential and discrete processes, but overlapped in maize. Although gene flow 

149 between maize and its wild relatives has been shown to be of adaptive importance for 

150 maize evolution (Van Heerwaardena et al. 201 1 ; Hufford et al. 2013), and has been 
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151 measured in tropical maize (Warburton et al. 2011), it has been difficult to measure the 

152 rate or effect of gene flow in temperate lines following divergence; thus, we do not 

153 factor it into our analysis. 
154 

155 Genome-wide selection analysis and functional correspondence 

156 The 701 selected regions were compared to 466 and 573 regions identified in 

157 domestication and improvement previously, respectively (Hufford et al. 2012; 

158 Supplemental Fig. S2). Seven regions were identified in all three processes, and may 

159 play a similar role in domestication and post-domestication, contributing to the unique 

160 phenotypes of temperate maize. Most adaptation regions did not overlap with 

161 domestication and improvement at all, indicating different genomic factors 

162 contributing to the phenotype changes during adaptation. 
163 

164 Gene Ontology (GO) annotation of the top 571 candidate genes (see Materials and 

165 Methods) within the 701 selected regions reflected genes responding to stress, 

166 development, and metabolic processes (Supplemental Table S4). MFT (Mother of FT 

167 and TFL1 , GRMZM2G059358), was identified as a strong candidate gene in 

168 adaptation (Fig. 1). It encodes a maize MFT-like protein, which is involved in seed 

169 dormancy and germination, a complex adaptation process modulated through a 

170 negative feedback loop via ABA (Xi et al. 2010). MFT is also known to be involved in 

171 control of shoot meristem growth and flowering time (Yoo et al. 2004). Flowering time 

172 changes is critical that allow maize to adapt to different environmental conditions such 

173 as photoperiod and temperature. Another gene, GRMZM2G360455, an important locus 

174 affecting the photoperiod response based on the maize nested association mapping 

175 (NAM) population analysis (Buckler et al. 2009), and also containing the 

176 "CO/CO-Like/TOCl conserved site" (CCT), which is always contributed in circadian 

177 clock and flowering time (Robson et al. 2001 ; Griffiths et al. 2003; Cockram et al. 

178 2012), was selected during maize adaptation (Fig. 3A). Metabolic processes 

179 influencing nutritionally important traits such as starch content and oil concentration 

180 could likely be targets of selection not only during domestication and improvement but 

181 also adaptation, as different soil and climate conditions will influence developmental 

182 stage of germinating maize. Some genes associated with nutritional traits were selected 

183 during the adaptation process in our study (Fig. 1), such as su2 (GRMZM2G34855 1), 

184 zpul (pullulanase-type starch debranching enzymel, GRMZM2G158043), and sdpl 

185 (GRMZM2G087612); these have also been identified as selection targets during 
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186 domestication and improvement (Zhang et al. 2004; Beatty et al. 1999; Eastmond 

187 2006). 
188 

189 Selected metabolic pathways responding to a changing environment are also involved 

190 in the adaptation process. Glutathione plays an important role in cellular processes 

191 under biotic stress (Dubreuil-Maurizi and Poinssot 2012) and is one such example. 

192 Accordingly, gst35 (Glutathione S-transferase 35, GRMZM2G161891), gst41 

193 (Glutathione S-transferase 4 1 , GRMZM2G097989) and gshl 

194 (gamma-glutamylcysteine synthetasel, GRMZM2G1 11579), which influence 

195 glutathione metabolism, were all within our selected regions (Fig. 1). Traits related to 

196 plant architecture and vegetative growth were also selected during adaptation, and their 

197 corresponding genes were found in our top XP-CLR hits, including rs2 (rough sheath2, 

198 GRMZM2G403620), essential for normal leaf morphology (Phelps-Durr et al. 2005); 

199 bk2 (brittle stalk2, GRMZM2G109326) affecting mechanical strength of maize by 

200 altering the composition and structure of secondary cell wall tissues(Ching et al. 2006), 

201 and aptl (aberrant pollen transmission 1, GRMZM2G448687), affecting the elongation 

202 of root cortex cells and pollen tubes during temperature stress (Xu and Dooner 2006) 

203 (Fig. 1, Supplemental Table S3). 
204 

205 Phenomic changes affecting adaptation and validation of selected regions. 

206 Adaptation involves selection of different ecotypes suited to different environments, 

207 leading to measurable phenotypic differences between environments. To test this 

208 assumption, we collected 662 phenotypic data points including agronomic, yield, seed 

209 quality, and seed metabolic traits (Supplemental Tables S5, 6, 7; Wen et al. 2014). The 

210 Q ST — F ST method (Leinonen et al. 2013), by comparing complex partitioning of 

211 variation of quantitative traits and neutral molecular markers (see methods for details), 

212 allowed us to distinguish whether the divergent traits are caused by directional 

213 selection (Q ST > F ST ), genetic drift (Q ST « F ST ), or stabilizing selection (Qst < Fst) 

214 (Leinonen et al. 2013). One hundred and thirty traits displayed both divergent patterns 

215 suggestive of directional selection across the populations and significant (p<5.1E-05) 

216 differences between the tropical and temperate subgroups (Supplemental Table S8), 

217 which were likely to contribute to improved phenotypic performance in temperate 

218 conditions. 
219 

220 To test if selected regions contributed to these phenotypic changes, a genome wide 
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221 association study (GWAS) was performed on the 130 divergent traits with 24,595 

222 SNPs from the selected regions. In total, 345 regions (49.22% of the total) were 

223 associated with 100 of the 130 traits (P < 4.06E-05), including three agronomic (days 

224 to silking, cob color and kernel color), one amino acid (Ala), and 96 metabolic traits 

225 (Supplemental Table S9). The genes identified here undoubtedly represent only a 

226 fraction of the total genes selected during adaptation, since many target traits were not 

227 measured in this study. 
228 

229 Resistance to new biotic stresses was essential to the adaptation process. Gene 

230 GRMZM2G095043 falls within a selected genomic region, and likely contributes to 

231 resistance to new pathogens faced by migrating maize. This gene contains a 

232 WD40-repeat domain potentially involved in the regulation of the flavonoid pathway 

233 (Koes et al. 2005), and showed a strong association with both cob color and Tricin 

234 O-pentosyl-O-hexoside (nl570) in temperate lines in this study (Fig. 3D, E, F; 

235 Supplemental Table S9). Tricin O-pentosyl-O-hexoside is a flavonoid, and flavonoids 

236 are colored compounds that increase insect resistance (McMullen et al. 2009; Falcone 

237 etal. 2012). 
238 

239 Quantitative trait locus (QTL) mapping was used to confirm the phenotypic effect of 

240 the selected regions. As flowering time changes is representative that allow maize to 

241 adapt to temperate environmental conditions such as photoperiod and temperature. 

242 Three RIL populations generated by crossing temperate lines with tropical/subtropical 

243 lines were used to map flowering time (Supplemental Table S10). The identified QTL 

244 were reported in Supplemental Table 7, and many (1 13, or 62%) overlapped 

245 significantly (P < 2.2E-16) with the identified selected regions (Fig. 1). For example, 

246 gene GRMZM2G360455, encoding a CCT domain-containing protein was located in 

247 the QTL interval for days to silking under short day (tropical) but not under long day 

248 (temperate) conditions, and also a candidate response to flowering time of the maize 

249 NAM population (Buckler et al. 2009), was identified within a selected region (Fig. 3A, 

250 B, C). Further study of this gene may provide a better understanding of maize 

251 flowering time and adaptation. 
252 

253 Significance of transcriptional regulation to adaptation 

254 Differential expression 

255 Changes in gene regulation, impacting gene expression level but not gene structure, are 
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256 fundamental to the evolution of morphological and developmental diversity 

257 (Swanson-Wagner et al. 2012; Carroll 2008). The present datasets provide an excellent 

258 resource for investigating the contribution of transcriptome regulation to the adaptation 

259 process. The coefficient of variation (COV) was similar between tropical and 

260 temperate germplasm when considering gene expression of all genes (Fig. 4A), 

261 suggesting that most inbreds were probably at the same developmental stage and that 

262 no transcriptome-wide changes occurred between tropical and temperate lines. This 

263 agrees with a previous study (Swanson-Wagner et al. 2012), indicating that overall 

264 changes in expression did not happen during domestication and post-domestication. To 

265 study specific transcriptome signal response of individual genes or groups of genes 

266 contributing to the adaptation process, QsrF ST of differentially expressed genes were 

267 compared, including single genes differentially expressed under different conditions or 

268 in different samples (DE), and genes with altered expression conservation (AEC) 

269 (Swanson-Wagner et al. 2012), representing the rewired co-expression of a gene 

270 network. 
271 

272 Among 28,769 expressed genes, 2,700 (9.4% of the total) showed significant 

273 differential expression (posterior prob. > 0.9999); all exceeded neutral expectation 

274 (Supplemental Table Sll). Comparing between temperate and tropical lines in the 

275 public available database with the same quantitative measurement and DE analysis 

276 process, we found a major part of these DE genes were also expressed differently in 

277 other tissues , including the shoot apex (Li et al. 2012), (P =1.88E-38) and seeding L3 

278 leaf (Eichten et al. 2013), (P =1.44E-15; Supplemental Fig. S3). This is highly 

279 suggestive that most differentially expressed genes identified here were not caused by 

280 random environmental and developmental variation and that many of the candidate DE 

281 genes continued to be important to adaptive differences later in the development of the 

282 mature plant. With more relaxed posterior probability (> 0.999), 14.4% of the total 

283 genes showed expression differences, and most were still likely to have been caused by 

284 directional selection (Supplemental Fig. S4). 
285 

286 There were 871 up- and 1,829 down-regulated genes in temperate vs. tropical lines 

287 (Fig. 4B, Supplemental Table Sll). These DE candidates tend to be regulated by 

288 distant eQTLs (P=1.74E-4, eQTL data from Fu et al. 2013), and especially for 

289 up-regulated (P=9.67E-12) but not for down-regulated (P=0.83) genes in temperate 

290 maize. With local (cis-) regulation, expression differences are caused by one regulatory 
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291 region near each expressed gene, while more distant (trans-) regulation often sees the 

292 expression of whole groups of genes regulated by a single genetic factor, causing 

293 potential widespread pleiotropic effects. Thus, trans -regulatory mutations seem to be 

294 better suited to the changes of complex pheno types which are governed by the 

295 coordinated expression patterns of multiple genes and a single regulator. While some 

296 previous studies have emphasized the key role for cis-regulatory evolution (Wray et al. 

297 2007; Stern et al. 2009), our results suggest that trans -regulatory variation could 

298 contribute more commonly to adaptive phenotypic divergence. 
299 

300 Gene Ontology (GO) analysis (Fig. 4C, D) of DE genes showed an enrichment of 

301 up-regulated genes involving molecular function, especially in catalytic activity, 

302 oxidoreductase activity, endopeptidase inhibitor activity and transferase activity 

303 (Supplemental Fig. S5). Endopeptidase levels are influenced by stress in plants (Antao 

304 and Malcata 2005; Richen et al. 2003) and animals (Karlsson et al. 2006), but clear 

305 mechanisms are still unknown. The GO analyses of down-regulated genes in temperate 

306 lines revealed enrichment of genes involved in processes such as response to stimuli, 

307 metabolism, and regulation (Supplemental Fig. S5). The same GO analyses also 

308 uncovered genes involved in cellular components and molecular functions; one 

309 down-regulated protein serine/threonine phosphatase involved in both aspects was 

310 particularly significant and is associated with biotic and abiotic stress (Antao and 

311 Malcata 2005) (Supplemental Fig. S5). 
312 

313 The MADS -box family of transcription factors is important in the evolution of plant 

314 architecture and angiosperm inflorescence development, and is frequently identified as 

315 targeted regions of selection during the domestication of maize (Zhao et al. 2011). 

316 Three MADS-box genes [zmm5 (GRMZM2G148693), zmm29 (GRMZM2G1 52862) 

317 and zapl (GRMZM2G171365)] were all down-regulated in temperate lines 

318 (Supplemental Fig. S6). These genes belong to the MICK C class of the MADS-box gene 

319 family and the TM3, GLO and SQUA subfamilies, respectively, and are involved in 

320 growth and flower homeotic function (Miinster et al. 2002). The circadian clock is vital 

321 in flowering time networks which consist of three negative feedback loops. Gene 

322 TOClb (GRMZM2G 148453) is located in the central loop (Kolmos et al. 2008), and 

323 showed down-regulation in temperate lines in the current study (Supplemental Fig. S6). 

324 Some domestication and improvement genes (James et al. 1995; Jackson and Hake 

325 1999; Gross and Olsen, 2010) were also identified in the DE analysis, such as tbl 
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326 (teosinte branched 1, AC233950.1_FG002), sul (sugary 1, GRMZM2G138060), and 

327 abphl (aberrant phyllotaxy 1, GRMZM2G035688) (Supplemental Fig. S6). The 

328 presence of domestication or improvement genes in the adaptation process implies 

329 sustaining natural and artificial selection throughout the entire process of maize 

330 evolution. 
331 

332 Only 129 genes were identified both in DE and genome selection analysis. Gene 

333 Ontology analysis of these genes further indicates a diverse range of biological 

334 functions, from response to stimulus to biological regulation and metabolic processes 

335 (Supplemental Fig. S7). Since the two data sets have few genes in common, no specific 

336 pathways were highlighted in the simultaneous analysis of the two data sets. Additional 

337 genes could be found in transcriptome analysis of RNA collected from other tissues 

338 and organs, and a more complete annotation of maize genes would increase the number 

339 of genes in known pathways, which are currently incomplete. 
340 

341 To seek association between DE genes and the divergent traits, transformed expression 

342 values of DE genes were analyzed via DE-GWAS (see Materials and Methods) with 

343 the 1 30 traits that had been found to be significantly different between tropical and 

344 temperate lines. Two hundred and forty five DE genes (9.07% of the total) were 

345 associated (P<lE-3) with 101 traits, including 75 traits overlapping with those detected 

346 by trait-SNP GWAS (Table 1). The 101 DE associated traits included 6 agronomic (cob 

347 color, days to silking, days to tasseling, days to pollen shedding, leaf number and 

348 kernel color), 5 amino acid (Ala, Arg, Asp, Lys, Gly) and 90 metabolic traits 

349 (Supplemental Table SI 2). Seven of the genes identified as associated in the genomic 

350 sequence analysis overlapped with genes identified in the DE-GWAS analysis. 
351 

352 A series of DE genes (c2, prl, al, bzl, whpl) were detected that influence cob color 

353 via the flavonoid biosynthetic pathway (Sharma et al. 2012) (Fig. 5). Cob color 

354 segregated only in the temperate group and is known to be affected by chalcone 

355 synthase (CHS) and maysin synthesis, which are thought to be major contributors to 

356 corn worm resistance (McMullen et al. 2009; Sharma et al. 2012). As can be seen in 

357 Fig. 5, c2 (colorless2, GRMZM2G422750) is one of the main genes of the CHS 

358 pathway and is located upstream of maysin synthesis (Sharma et al. 2012). prl (purple 

359 aleuronel or red aleuronel, GRMZM2G025832) is located downstream of c2, encodes 

360 a CYP450-dependent flavonoid 3 ' -hydroxylase required for synthesis anthocyanin, and 
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361 is involved in naringenin chalcone catabolism (Sharma et al. 2012). al 

362 (anthocyaninless 1 , GRMZM2G026930), located downstream of c2 and prl , is 

363 involved in the production of anthocyanins (Sharma et al. 2012). c2 and al were 

364 significantly associated with flavonoid catabolism (Supplemental Table SI 2). Kernel 

365 color ranked from light to dark (Supplemental Fig. S8A) displayed different 

366 distributions between tropical and temperate subgroups (Supplemental Fig. S8B), and 

367 was slightly negatively correlated with cob color (Supplemental Fig. S8C, P = 

368 1 .3 1E-05). Kernel color was also affected by DE genes within the CHS pathway and 

369 different association patterns were observed between tropical and temperate lines by 

370 DE-GWAS analysis (Supplemental Fig. S8D). 
371 

372 Altered expression conservation 

373 While DE can identify individual differentially expressed genes, altered expression 

374 conservation (Swanson-Wagner et al. 2012) (AEC) reflects the relationship of genes 

375 with their co-expressed group. In total, 389 genes showed the strongest AEC patterns 

376 (expression conservation score >2.5SD; Supplemental Table SI 3). Further analysis 

377 indicated that the average number of genes significantly co-expressed with AEC 

378 candidates (as measured by an absolute Pearson correlation coefficient > 0.3) in 

379 temperate lines was higher (259 genes) than in tropical lines (147 genes; Supplemental 

380 Fig. S9). This indicates that the changes faced in temperate environments may have 

381 enhanced interactions between genes in certain pathways. Among the stronger 

382 relationships, the number of genes with negative correlations (Pearson correlations 

383 coefficient < 0) was higher in temperate germplasm as well (24.4% in temperate lines 

384 vs. 3.2% in tropical lines). Few genes were co-expressed with the same candidate AEC 

385 gene in both temperate and tropical lines (Supplemental Fig. S9), which suggests a 

386 rewiring of the regulation networks in the temperate subgroup during adaptation. 

387 However, the rewiring appears to have been less dramatic during adaptation since 

388 fewer AEC genes were identified for adaptation than for domestication 

389 (Swanson-Wagner et al. 2012). In contrast, the number of differentially expressed 

390 individual genes found in the adaptation analysis was higher than for domestication 

391 (Swanson-Wagner et al. 2012). 
392 

393 As sessile organisms, plants have evolved to integrate endogenous and external 

394 information and employ signal transduction processes to allow growth plasticity and 

395 survive and thrive in their environments. Essential to this plant-environment interaction 
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396 are plant hormones, including auxins, ethylene, abscisic acid, and brassinosteroids, 

397 which play a key role in plant growth, development, defenses and stress tolerance 

398 (Wolters and Jiirgens 2009). Plant hormones modulate gene expression by controlling 

399 either the abundance of transcriptional factors or repressors, or their activities through 

400 post-translational modifications (Dharmasiri et al. 2013). Hormonal cross-talk and 

401 interaction with other plant compounds and environmental factors rely on complicated 

402 signaling networks. These networks generally intersect in central nodes. Six candidate 

403 genes related to plant hormones were observed in our AEC analysis and may act as 

404 central nodes in plant hormonal production. Three (GRMZM2G078480, 

405 GRMZM5G86024 1 and GRMZM2G086773) belong to brassinosteroid biosynthesis 

406 pathway, one (GRMZM5G864847) was an auxin-responsive Aux/IAA family member, 

407 and two (GRMZM2G026131, GRMZM2G390385) were in the pathway producing 

408 ethylene biosynthesis from methionine (Supplemental Fig. S10). These candidates, 

409 along with their co-expressed genes, showed very different networks in temperate vs. 

410 tropical germplasm (Supplemental Fig. S10). This could provide helpful clues to a 

411 deeper understanding of the complex relationship between hormones and their 

412 contributions to maize adaptation. 
413 

414 Many studies have identified the function of F box receptors in hormone controlling 

415 and signaling (Dharmasiri et al. 2005; Kepinski and Leyser 2005; Koops et al. 2011; 

416 Shen et al. 2012). AEC analysis also identified an F-box protein (GRMZM2G031958; 

417 Jia et al. 2013) displaying different regulation patterns, with a series of 462 and 85 

418 highly co-expressed genes observed in temperate and tropical lines, respectively. 

419 Beyond the remarkable difference in network size, several co-expressed enzymes, 

420 including cytochrome-c reductases, NADH-ubiquinone oxidoreductases, peroxidases 

421 and amidase, and some genes functioned in abscisic acid and IAA biosynthesis and 

422 gluconeogenesis were only exists within temperate lines. These results are consistent 

423 with the earlier studies that F-box proteins could play a regulatory role in 

424 glucose-induced seed germination by targeting ABA synthesis (Song et al. 2012). 

425 F-box proteins also play critical roles in seed development, grain filling and response 

426 to abiotic stress(Jain et al. 2007) in crop plants, and co-expressed (with our F-box 

427 candidate) genes involved in cytokinins degradation, cellulose biosynthesis, and 

428 flavonoid and flavonol biosynthesis pathways were observed in the temperate 

429 subgroup but not in the tropical. This F-box gene was also highly co-expressed with an 

430 ethylene-responsive factor-like protein (GRMZM2G169382), an abscisic stress protein 
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431 homolog (GRMZM2G044 1 32), a SAUR37-auxin-responsive family member 

432 (GRMZM2G045243), another two F-box domain-containing proteins 

433 (GRMZM2G 1 1 6603 and GRMZM2G07 1 705 ) and several transcriptional factors 

434 (zmm29, GRMZM2G152862; ethylene-responsive transcription factors, 

435 GRMZM2G474326 and GRMZM2G169382; WRKY1, GRMZM2G018487; 

436 WRKY25, GRMZM2G148561; WOX2B, GRMZM2G3 39751; and more with putative 

437 regulation of transcription functions). In addition, the teosinte glume architecture 1 

438 (tgal) protein and an Early Flowering 4 (EF4)-like protein simultaneously showed 

439 strong relationships with the target F-box gene. These two proteins have been 

440 hypothesized to contribute to maize adaptation to temperate climates in past studies 

441 (Khanna et al. 2003; Ducrocq et al. 2008). The identification of the genes encoding 

442 these proteins made the network more complex, and further studies are needed. Gene 

443 Ontology analysis of all the co-expressed genes in temperate lines revealed an 

444 abundance of genes that respond to temperature stimulus and abiotic and other stress 

445 (Supplemental Table S 14), which were undoubtedly required during maize adaptation 

446 to temperate environments. 
447 

448 Discussion 

449 The process of adaptation was as important a driving factor as domestication in the 

450 creation of a geographically diverse crop, allowing maize to spread into a wide range 

451 of environments around the world. The impacts of domestication and improvement on 

452 the genome and transcriptome have started to be studied (Hufford et al. 2012; 

453 Swans on- Wagner et al. 2012), but adaptation has not been as systematically analyzed. 

454 In this study, large sets of genomic, transcriptomic and phenomic data were used to 

455 analyze the mechanisms of morphological evolution leading to adaptation. Plant 

456 response to the environmental change especially stress may have been the key initial 

457 step towards adaptation to more extreme latitudes (Fig. 2D). A variety of mechanisms 

458 could have contributed to stress response during the maize life cycle, including 

459 changes in seed dormancy, germination, plant architecture, flowering time, and optimal 

460 utilization of resources (nitrogen, water, etc). In the face of new biotic stresses, a series 

461 of resistance mechanisms can also evolve at both the genomic and transcriptomic 

462 levels. Traits allowing the plant to successfully respond to stress are precisely those of 

463 interest to farmers and plant breeders who have achieved improvement of temperate 

464 lines by the selection of such stress tolerance (Tollenaar and Wu 1999). 
465 
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466 It has been suggested that studies scanning for positive selection may incur high 

467 false-positive rates and can be misleading (Pavlidis et al. 2012). In this study, we use 

468 two additional and reliable methods, GWAS and QTL mapping, to provide stronger 

469 evidence for the selected regions and a link to the contributory traits. We also identified 

470 transcriptomic variants contributing to the adaptation process, including differentially 

471 expressed individual genes and evidence for rewiring of co-expression networks. These 

472 candidate genes and regions were found to be functionally related to stress response 

473 and most were associated with the directionally selected traits. While this study 

474 focused on seed transcriptome, the seed-expressed genes and phenotypes provide the 

475 first steps towards a systematic study of the adaptation process and inform our 

476 understanding of the extent to which transcriptome variation influences the 

477 environmental adaptation process. 
478 

479 It has been recently reported that human adaptation is driven primarily by gene 

480 expression changes (Fraser 2013). The present study reveals that transcriptome 

481 regulation was also prevalent in maize adaptation. Plants live in a dynamic 

482 environment, but selection on genomic variation may be too slow to cause the changes 

483 that allow the plant to adapt to a rapidly changing environment (Chen 2007). Selection 

484 on protein coding sequence variation is risky, as most mutations are harmful or lethal 

485 to the organism, and even changes which are beneficial to some of cells or under some 

486 conditions may be harmful to other tissues or under other conditions (Ecker et al. 2012). 

487 Sufficient protein-coding sequence changes at the genomic level would be unlikely to 

488 respond to environmental changes experienced in one or a few generations ; however, 

489 rapid changes in transcriptome regulation can occur quickly and lead to a rapid 

490 phenotypic differentiation (Chen 2007). Transcriptome changes are 

491 resource-economical and are frequently associated with temporally and spatially 

492 related gene -expression patterns, the effects of which can be limited to specific cells 

493 (Carroll 2008; Ecker et al. 2012). Our models, indicating a separation time between 

494 temperate and tropical maize of 3,400-6,700 years BP, suggest that a great number of 

495 changes took place during a few thousand years. The differences in transcription of 

496 individual genes and correlated suites of genes between temperate and tropical maize 

497 can be explained by this hypothesis. Some studies (Cortessis et al. 2012; Ptashne 2007) 

498 have suggested that epigenetic regulation was the main genetic driver causing 

499 regulatory evolution, and that epigenetic modification could also lead to increased 

500 mutation rate (Rideout et al. 1990; Schuster-Bockler et al. 2012). Further study and 
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501 more direct evidence will be needed to better understand the interplay between 

502 epigenetic and genetic processes under selection and provide new insights, and 

503 possibly new mechanisms, for practical plant improvement. 
504 

505 Although our comprehensive study of genomic, transcriptomic and phenomic variation 

506 sheds new light on the process of adaptation in modern maize, much is left to be 

507 uncovered. In particular, changes due to adaptation of temperate maize are inferred in 

508 the current study by comparing temperate maize with tropical maize grown in northern 

509 temperate growing areas. Although similar divergence and changes may have 

510 happened as maize migrated to far southern temperate regions, no South American 

511 maize was studied here and thus this conclusion remains to be confirmed. In future 

512 studies, expression data from more tissues and genotypes need be included and studied 

513 under more environmental conditions to allow a finer dissection of genes and 

514 mechanisms involved in adaptation. It also should be kept in mind that all the 

515 genotypic variation was identified by comparison to the temperate maize reference 

516 genome (B73). If a bias was introduced by this method of polymorphism identification, 

517 it was not considered in this study. Novel assembly strategies taking into account the 

518 variation from more lines could reduce this bias (if present), and exploration of more 

519 variant types, such as presence/absence variation (PAV), should also be considered in 

520 future studies of the adaptation process. Detailed studies on changes in the 

521 transcriptome and in particular, the role of epigenetics, could lead to a clearer 

522 understanding of adaptation, possibly leading in turn to more innovative techniques to 

523 allow plant breeders to apply native trait variation to maize improvement. 



16 



Downloaded from http://biorxiv.org/on September 18, 2014 



524 Methods 

525 Maize inbred lines and collection of genotypic, phenotypic and gene expression 

526 data. 

527 The 368 maize inbred lines included in this study form a global collection including 

528 representative temperate and tropical/subtropical inbred lines listed in Supplemental 

529 Table SI, and additional information about the lines can be found in previous study 

530 (Yang etal. 2011). 

531 

532 The poly(A) transcriptome collected from kernels at 15 days after pollination from all 

533 368 lines was sequenced using 90-bp paired-end Illumina sequencing with libraries of 

534 200-bp insert sizes, and 25.8 billion high-quality reads were obtained after filtering out 

535 reads with low sequencing quality. A total of 1.03 million high-quality single 

536 nucleotide polymorphisms (SNPs) with a missing data rate less than 60% were used for 

537 imputation of missing genotypes. Three pairs of biological replicates (SK, Han21, and 

538 Ye478) were used to evaluate the reproducibility of genotyping by RNA-seq. The 

539 concordance rates were greater than 99% between each pair of replicates (Fu et al. 

540 2013), indicating that the sequencing and SNP calling procedures were reproducible. In 

541 addition, all lines were genotyped via the Illumina MaizeSNP50 array. SNPs generated 

542 by RNA-seq also met high concordance rates with the genotypes determined by the 

543 MaizeSNP50 BeadChip (Fu et al. 2013). Both RNA-seq and MaizeSNP50 data sets 

544 were merged to obtain a total of more than 1 .06 million SNPs; a final data set of 0.56 

545 million SNPs with a MAF of greater than 5% was produced and more detailed 

546 information can be found in a previous study (Fu et al. 2013). 

547 

548 To quantify the expression of known genes with annotation in the B73 reference 

549 genome (filtered-gene set, version 2, release 5b), a total of 28,769 genes corresponding 

550 to mapped sequence reads in more than 50% of the inbred lines were compiled, and 

551 each gene averaged more than 1.5K reads. Read counts for each gene were calculated 

552 and scaled according to RPKM (reads per kilo base of exon model per million mapped 

553 reads). After RPKM normalization, all genes with a median expression level greater 

554 than zero for each sample were included, and the overall distribution of expression 

555 levels for each gene was normalized using a normal quantile transformation (Fu et al. 
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556 2013). The same three pairs of biological replicates were shown to share most 

557 compiled genes (average 95.71%) with high concordance (average person's r=0.87, 

558 (Supplemental Fig. SI 2) in expression quantification between each pair of replicates. 

559 More details on library construction, sequencing, SNP detection, genotype imputation, 

560 positive control of SNP accuracy and quantile normalization of expression is described 

561 inFuetal. (2013). 

562 

563 To obtain agronomic traits (reported in Supplemental Table S5), all inbred lines were 

564 planted with randomized complete experimental design by single replication in 2010 in 

565 four locations (Honghe autonomous prefecture, Yunan province; Sanya, Hainan 

566 province; Wuhan, Hubei province; Ya'an, Sichuan province) and 201 1 in three 

567 locations (Chongqing; Hebi, Henan province; Nanning, Guangxi province). The seven 

568 different locations ranged from 18 to 35 degrees north in latitude and from 102 to 114 

569 degrees east in longitude. Kernel color was measured in five additional trials over two 

570 years (Sanya, Hainan province in 201 1 ; Honghe autonomous prefecture, Yunan 

571 province in 2012; Chongqing in 2012; Wuhan, Hubei province in 2012; Hebi, Henan 

572 province in 2012). All agronomic traits were measured and the Best Linear Unbiased 

573 Predictor (BLUP) values from different environments and years were used for final 

574 analysis (Supplemental Table 2). Maize kernels from each entry in the panel planted in 

575 Chongqing in 201 1 were collected to quantify amino acid content (Supplemental Table 

576 S6), and samples from the panel planted in Yunnan (201 1) and Hainan (2010) were 

577 harvested to measure metabolic traits. Only the metabolites measured in at least two of 

578 the three experiments and showing high correlation (P value<0.05) in the two 

579 experiments were retained to calculate BLUP values for the next analysis 

580 (Supplemental Table S7). 

581 

582 Population structure of 368 inbred lines 

583 The STRUCTURE software package (Pritchard et al. 2000) was used to analyze the 

584 population structure and the EIGENSOFT analysis package (Patterson et al. 2006; 

585 Price et al. 2006) was used to run a Principal Component Analysis (PCA) of the panel 

586 used in the present study (Supplemental Fig. SI 3). Considering the computation time, a 

587 SNP marker subset was used for inferring population structure; the subset of 14,685 

588 SNPs was created by removing adjacent SNPs within 50Kbp intervals. All lines were 
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589 divided into three subpopulations corresponding to stiff stalk (SS), non-stiff stalk (NSS) 

590 and tropical and sub-tropical (TST) clusters by STRUCTURE, using a probability of 

591 inclusion into each cluster of greater than 0.65(Yan et al. 2009). In the final analysis, 

592 there were 133 temperate (103 NSS + 30 SS) lines, 149 tropical (TST) lines, and 86 

593 mixed lines (Supplemental Table SI, Supplemental Fig. S13). The mixed were 

594 excluded from further analyses to allow a clearer comparison between tropical and 

595 temperate features. The population structure was similar to the analysis of the same 

596 lines reported previously using 1536 SNPs (Yang et al. 201 1). 

597 

598 Measuring the genomic changes occurring during the maize adaptation process 

599 To evaluate changes in the maize genome due to adaption, population genetics 

600 statistics including jc (Tajima 1983) and F„(Nei 1977) were calculated within the 

601 differently adapted maize groups. F st was estimated as follows: 

Fst = (H T - H S )/H T 

H s = l- o) tem x {freqA 2 tem + freqBf em ) - u) tst x (freqA 2 st + freqBf st ) 

H T = 1- (o) tem x freqA tem + o) tst x freqA tst ) 2 

~ (^tem x freqB tem + a) tst x freqB tst ) 2 

602 Where H s refers to heterozygosity within subpopulations, and H T refers 

603 heterozygosity in the overall population. The variable o) refers to proportion of 

604 subpopulation based on size, and freqA/freqB is the frequency of allele A or allele 

605 B in each subpopulation. The Synbreed package (Wimmer et al. 2012) of the R-project 

606 statistical package (http://www.R-project.org) was used to compute the linkage 

607 disequilibrium (LD) coefficient, r 2 . The ggplot2 package of R (Wickham 2009) was 

608 used to plot LD decay, as well as all visualization in this study (except as noted, 

609 below). 

610 

611 To identify regions of the maize genome that have undergone selection during the 

612 process of adaption from tropical to temperate climates, a cross -population composite 

613 likelihood approach (Chen et al. 2010) (XP-CLR) was used. The following parameters 

614 were applied to implement the XP-CLR test: the size of the window was 0.005 Morgan; 

615 the maximum number of controlled SNPs within a window was 100; the spacing 

616 between two grid points was 1000 bp; and a corr Level value of 0.7 was used as a 
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617 down-weighted criterion in the weighted composite likelihood ratio test. Adjacent 

618 lOkb-windows from the top 20% of the XP-CLR results were merged into larger 

619 regions, according to Hufford et al. (2012), and only one window lower than 20% was 

620 retained for each region. Regions in the highest 10% of the mean region-wise XP-CLR 

621 values were regarded as having undergone selection. Gene sequences closest to the 

622 maximum XP-CLR value were designated as the most likely candidate genes for 

623 selection, while others within each selected region were considered as possible selected 

624 genes. The linkage map used in the XP-CLR analysis was constructed using an RIL 

625 population generated from the cross of B73*BY804 with 197 individuals described 

626 previously (Chander et al. 2008) and the maize SNP50 chip (Ganal et al. 201 1) was 

627 used to re-genotype the RIL population with 15,285 polymorphic SNPs. The distance 

628 between unmapped SNPs was estimated based on the constructed linkage map and B73 

629 reference genome (version 2). 

630 

631 Estimating the relative divergence time between temperate and tropical lines 

632 Assuming a time scale of teosinte/maize divergence of about 10,000 years, an 

633 FsT-based approach (Schlebusch et al. 2012) was used to estimate a relative divergence 

634 time between temperate and tropical subpopulations, under the assumptions of no 

635 genetic drift, no change in effective population size, and equal generation times in each 

636 lineage. Although violations of these assumptions are probable and may reduce the 

637 accuracy of the estimated divergence times, the estimated divergence times will still be 

638 useful for the purposes of this study and in the general study of maize evolution. We 

639 combined our SNP data with the maize HapMapII (Chia et al. 2012) data and retained 

640 all the SNPs from teosinte (Chia et al. 2012) and maize with the same loci, consistent 

641 alleles, and missing ratio of alleles less than 20% to calculate F st between temperate 

642 maize (TEM) and teosinte (TEO, 0.0668), and between TEM and tropical/subtropical 

643 maize (TST, 0.0453). Divergence time (T, measured in units of 2Ne generations where 

644 Ne is the effective number of diploid individuals) was calculated using F st as according 

645 to the following formula (Schlebusch et al. 2012): 

T= -log(l-F st ) 

646 Most of the SNP genotypes were located within expressed sequences, but the results 

647 should not be affected by this, assuming similar biases between the two comparisons 

648 and sufficient numbers of markers to smooth out unequal biases due to potential 
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649 unequal selection pressures on some loci. Different models were proposed to improve 

650 estimation of the relative divergence time. Assuming the teosinte lines from which we 

651 extracted SNPs were indeed the primitive ancestors (or contained the same sequence 

652 diversity within them as the actual ancestors), then T TEM _ TE0 = 10,000 years; (Fig. 

653 2A) and we can calculate the divergence time between TEM and TST using the 

654 following formula: 

Ttem-tst /Ttem-teo = — l°g (1 — P s ^tem-tst)/(.~ log(l — FstrEM-TEo)) 

655 

656 This resulted in a divergence time of T TEM _ TST =6,700 years BP. Assuming further that 

657 the teosinte lines have undergone a similar selection pressure (Fig. 2B), the analogical 

658 formula can be used to calculate T TEM _ TST divergence time as 3,400 years BP. 

659 However, because it is more likely that the teosinte lines have experienced a selection 

660 level that is not as strong as that of the adaptation process (Fig. 2C), the true 

661 divergence time of adaptation would fall between the two estimated times. Bioclimatic 

662 variables data from WORLDCLIM database (Hijmans et al. 2005) and the DIVA-GIS 

663 (Hijmans et al. 2012) software was used to map the stress (annual mean temperature) 

664 faced by maize during the adaptation process (Fig. 2D). 

665 

666 Analysis of directional selection of phenomic divergence 

667 Q ST — F ST comparison provide us with a method to distinguish population 

668 differentiation of complex polygenic traits as having been caused by natural selection 

669 or by genetic drift (Leinonen et al. 2013). Q ST was estimated for all phenomic traits 

670 including expression traits as follows: 

Qst = ^gb/(Pgb + ^gw) 

671 Where Oq B refers to the genetic component of variance among subpopulations, and 

672 Oq W is the average component of variance within each subpopulation. 

673 To accurately estimate the distribution of mean F ST among tropical and temperate 

674 subpopulations, 10,000 SNPs were chosen randomly from the entire SNP data set and 

675 the calculation was repeated 1,000 times (Supplemental Fig. S14). By applying a strict 

676 outlier definition, we employed a 99% confidence interval (0.025-0.0273) for the F ST 

677 distribution, ensuring a more correct comparison of Qst-Fst-A Qst>Fst value was 

678 regarded as proof of trait divergence outstripping the expectation of a neutral state and 

679 thus of a strong directional selection signal (Leinonen et al. 2013). 
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680 

681 Differential expression analysis 

682 Cyber-T (Baldi and Long 2001), a regularized t-test method that also contains 

683 statistical inferences on experiment-wide false positive and negative levels based on 

684 the modeling of p-value distributions, was used on normalized expression data (Fu et al. 

685 2013) for the identification of statistically significant differentially expressed genes. 

686 Posterior Probability of Differential Expression (PPDE) >0.9999 was determined to 

687 identify differentially expressed (DE) genes in our study. 

688 

689 Characterization of genes displaying altered expression conservation 

690 To identify which genes show Altered Expression Conservation (AEC), a statistic 

691 which reflects the co-expression of genes in a gene network (Swanson-Wagner et al. 

692 2012), the expression data was divided into 2 matrices {E tst and E tem ) based on 

693 adaptation (tropical and temperate) and a co-expression network was created for each 

694 matrix (Rfj 1 and Rfj m ). The hmisc package in R (Harrell 2012) was used to calculate 

695 the Pearson correlation coefficient between each pair of gene expression values within 

696 each subset, (TEM or TST). Hmisc is an efficient algorithm for calculations in very 

697 large data sets, and is calculated as: 

Rff = PCC(El st , Ef st ) 
R?j m =PCC(El em ,E] em ) 

698 for i, j = 1 28,769. Thus, R\f and Rfj m are square matrices with the same 

699 dimensions (28,769 x 28,769). Each value in the two matrices represents an edge 

700 weight in the co-expression network and is the measured similarity between expression 

701 profiles of paired genes. Although the two matrices have identical dimensions, the 

702 distribution of values in each matrix may differ because of unequal sample sizes. Thus, 

703 to compare the two co-expression networks more accurately, the distributions were 

704 normalized by subtracting the mean and dividing by the SD to obtain a standard 

705 normal distribution. An expression conservation (EC) score was calculated as the 

706 Pearson correlation coefficient between gene profiles in the two co-expression 

707 networks as described by Swanson-Wagner et al. (2012): 
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708 Where Ef em and E[ st were represented by the i-th rows in the temperate and tropical 

709 co-expression matrices, respectively. AEC genes were selected using a z score 

710 (Swanson-Wagner et al. 2012) to calculate each EC value as follows: 

EC - n 

z = 

o 

711 Where \i and o were the mean and SD of all the gene's EC scores. The z score cutoff 

712 of altered expression conservation values of genes was set at <-2.5. The software 

713 Circos (Krzywinski et al. 2009) was used for visualizing the results in a circular layout. 

714 

715 Candidate gene annotation and GO enrichment analysis 

716 To more fully explore candidate genes' functions, the annotation resources of 

717 maizeGDB (Lawrence et al. 2008) and the InterPro (Zdobnov and Apweiler 2001) 

718 database were integrated into the analyses. Gene Onotology (GO) enrichment analysis 

719 was maintained by AgriGO (Du et al. 2010) with the Fisher statistical test method (P 

720 <=0.05) and Yekutieli muti-test adjustment method (FDR<=0.05).The GOSlimViewer 

721 of AgBase (McCarthy 2006) was used to implement GO slim analysis, and the updated 

722 GO items of the maize genome were downloaded from Ensembl BioMart (Kinsella et 

723 al. 2011) on April 4th, 2013. 

724 

725 Association and QTL mapping analysis 

726 SNPs within regions that have experienced selection according to the XP-CLR 

727 approach (24,595 in total) were used in an association analysis with the 111 different 

728 traits using the software package GAPIT (Lipka et al. 2012) and a compressed mixed 

729 linear model. The cutoff for significance for associations was set at 1/n (n is the 

730 number of SNPs used, p <4.06E-05). To better compare trait-SNP GWAS results with 

731 the genes identified following DE analysis, the DE identified genes were transformed 

732 prior to analysis into a discontinuous pseudo genotype with two alleles, one if 

733 expression of the gene for a given line was higher than the median of all lines, and the 

734 other if it was lower. Three recombinant inbred line (RIL) populations were used to 

735 map QTL: BY population: BK (an inbred selected from a tropical landrace with very 
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736 big kernel size) x Yu87-1 (an elite inbred with tropical background used frequently in 

737 Chinese breeding programs), SZ population: SK (an inbred selected from a tropical 

738 landrace with very small kernel size) x Zheng58 (an elite inbred used frequently in 

739 Chinese breeding programs), and YZ population: Yu87-1 (an elite inbred with tropical 

740 background used frequently in Chinese breeding programs) x Zong3 (an elite inbred 

741 used frequently in Chinese breeding programs). The QTL were mapped for three 

742 flowering times traits: days to tasseling (DTT), measured as the number of days from 

743 planting to 50% male flower appearance; days to silking (DTS) or the number of days 

744 from planting to 50% female flower appearance; and days to anthesis (DTA), the 

745 number of days from planting to 50% male flower pollen shed. Mapping was done 

746 using WinQTLcart (Wang et al. 2011) and a LOD threshold value for significance set 

747 to 2.5. The flowering time traits are an indication of adaptation and thus used as a 

748 proxy for that trait. 
749 

750 Comparison with DE gene sets from other tissues 

751 Li et al. (Li et al. 2013) conducted RNA-seq on the shoot apex from 2-wk-old 

752 seedlings of the NAM founders. We downloaded the raw data and quantified with the 

753 same procedure used in the present study (Fu et al. 2013) and average expression level 

754 of the two runs of each line was used. The third leaf (L3) after germination of the 

755 NAM founders was also RNA-sequenced by the Springer laboratory, and the RPKM 

756 results were obtained by the author (Eichten et al. 2013). These two studies provided 

757 different tissues and different genetic backgrounds compared to the current study; these 

758 were used to test if genes identified with differential expression in the current study are 

759 tissue-specific or if, as assumed in the current study, will prove to have lasting effects 

760 during maize development and maturity. Furthermore, consistency of expression 

761 differences in different populations can be validated. Genes with zero mapped reads in 

762 more than 60% of the inbred lines were excluded and the overall distribution of 

763 expression levels for each remaining gene was normalized using the normal quantile 

764 transformation (Fu et al. 2013). Lines with unmixed temperate (NSS+SS) and tropical 

765 backgrounds were used to call DE genes (P value < 0.05) with the same method (Fu et 

766 al. 2013). To test if the number of overlapping DE genes (Supplemental Fig. S3) is 

767 significant between the current study and the two published experiments, random 

768 subsamples of genes were chosen from each comparison pair (using same the number 

769 for each subset as the number of DE genes identified by each study); this simulation 
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770 was repeated 10,000 times to create a distribution of random overlap DE comparisons 

771 (Supplemental Fig. S3). This distribution was used to test if the observed number of 

772 DE overlaps is in accordance with the simulated normal distribution. 
773 
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774 Data access 

775 There is no newly generated data in this study. All the data used in this study were 

776 published previously, and can be found in the Methods section. 
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801 TABLE LEGEND 

802 Table 1. Summary of SNP-trait and expression-trait association studies 
803 

804 

805 FIGURE LEGENDS 

806 Figure 1. Integrated results of genome-selection (XP-CLR), transcriptome analysis (DE 

807 and AEC) and QTL mapping. © Ten chromosomes of maize. © XP-CLR value: the 

808 top 20% are marked in red, and the bottom 80% in grey.© QTL mapping results for 

809 days to silking (DTS, red), days to tasseling (DTT, green) and days to anthesis (DTA, 

810 blue). © Results of DE and AEC. Red and blue boxes indicate up- and 

811 down-regulated genes in temperate maize (TEM) relative to tropical (TST), 

812 respectively; green boxes indicate AEC genes. Some of the larger boxes are genes 

813 referred to in the text. 
814 

815 Figure 2. Maize dispersion map and divergence time estimations. (A) Divergence time 

816 (T) was estimated between temperate (TEM) and tropical/subtropical (TST) maize, 

817 using extant teosinte (TEO) lines as the ancestor of maize. (B) Divergence time (T) 

818 was estimated using a model supposing a common ancestor to TEM, TST, and TEO 

819 and equal selection pressure on extant TEO lines (leading to T' for teosinte) as on TST 

820 and TEM. (C) Divergence time was estimated between (A) and (B), with the more 

821 likely assumption that TEO lines have experienced a lower level of selection pressure 

822 (and smaller T') than the intensity faced by TEM and TST during the adaptation 

823 process. (D) Maize dispersion map showing environmental difference in annual mean 

824 temperature. Arrows show possible dispersion routes and the numbers beside the routes 

825 indicate the likely time (in years before present) when the dispersion happened, and the 

826 circle (beside a silver star) is the center of maize domestication (Hufford et al. 2012). 

827 Even though there are no inbred lines from South America, the dispersion route to 

828 South America was inferred from a previous study (Wallace et al. 2014). 

829 

830 Figure 3. Examples of validation of the function of genes within selected genomic 

831 regions, using GWAS, QTL mapping, and gene annotations. (A) Gene 

832 GRMZM2G360455 was in a selected region on chromosome 5. (B) The region which 

833 contained GRMZM2G360455 covered a QTL for days to silking (DTS). (C) 
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834 Annotation of GRMZM2G360455 disclosed a CCT domain in this gene which is 

835 related to flowering time. (D) GRMZM2G095043 was in the selected region on 

836 chromosome 1. (E) GRMZM2G095043 was strongly associated with cob color and 

837 naringenin, which is upstream of the anthocyanin and maysin pathways, and causes 

838 changes in cob color. (F). Annotation of GRMZM2G095043 indicates that it contains a 

839 WD40 domain which could control multiple enzymatic steps in the flavonoid pathway. 

840 

841 Figure 4. Transcriptional differential expression analysis. (A) Density plots for the 

842 coefficient of variance (COV) for gene expression levels in all lines (black), temperate 

843 lines (blue), and tropical lines (red). (B) Genes with significantly different expression 

844 were used for hierarchical clustering. (C) Enrichment analysis of GO annotation within 

845 up-regulated genes in temperate lines relative to tropical lines. (D) Enrichment analysis 

846 of GO annotation of down-regulated genes in temperate lines relative to tropical lines. 

847 Within the specific GO terms in (C) and (D), MF represents genes of molecular 

848 function, CC represents genes with a cellular component, and BP are genes associated 

849 with biological processes. 

850 

851 Figure 5. Cob color was influenced by the flavonoid biosynthetic pathway in maize. (A) 

852 A simplified flavonoid biosynthetic pathway. Genes in red were found to be associated 

853 with cob color in this study. (B). Genes involved in the flavonoid pathway showed 

854 significantly differential expression between temperate and tropical groups. (C). Three 

855 DE genes (c2 (GRMZG2G422750), prl (GRMZG2G025832), al (GRMZG2G026930)) 

856 had significant association with cob color. 
857 

858 
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1133 Table 1. Summary of SNP-trait and expression-trait association studies 



Associated Agronomic Amino Metabolic Related 



traits" traits acid traits traits genes b 

Genome 100 3 1 96 518 

Transcriptome 101 6 5 90 245 

Overlapped 75 3 1 71 7 

Total 126 6 5 115 756 



1134 "represents the divergent traits which were associated with varied markers (SNPs or DE genes) 

1135 using genome wide association study (GWAS). 

1136 b represents the closest genes (or within genes) of the significant associated markers. 
1137 
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